Abstract: Reduze is a computer program for reducing Feynman integrals to master 
integrals employing a variant of Laporta's reduction algorithm. This article describes 
version 2 of the program. New features include the distributed reduction of single topologies 
on multiple processor cores. The parallel reduction of different topologies is supported via 
a modular, load balancing job system. Fast graph and matroid based algorithms allow for 
the identification of equivalent topologies and integrals. 
1 Introduction 

In perturbative quantum field theory, the traditional method to compute cross sections 
and distributions for a physical process involves generating tree and loop amplitudes via 
Feynman diagrams and interfering them. Simplifications of the expressions are performed 
at the analytical level. Here an essential part is the reduction of the typically dimensionally 
regularized loop integrals [1] to a small number of standard integrals. This step can be 
performed at the amplitude level for tensor integrals or, after contraction of Lorentz indices, 
at the level of interferences for scalar integrals. Considering the case of scalar integrals, 
integration by parts (IBP) identities [2, 3] and Lorentz invariance (LI) identities [4] may be 
used for a systematic reduction to a set of independent integrals, called master integrals. 
The standard reduction algorithm by Laporta [5] defines an ordering for Feynman integrals, 
generates identities and solves the resulting system of linear equations. Alternative methods 
to exploit IBP and LI identities for reductions have been proposed [6-9], see also [10, 11] and 
references therein. Public implementations of different reduction algorithms are available 
with the computer programs AIR [12], FIRE [13] and the first version of Reduze [14]. 

This article presents the new public reduction program Reduze 2. It is written in C++ 
and represents a major rewrite and extension of its predecessor Reduze. In the following, 
the name Reduze refers to the new version presented here. 

In Reduze, integrals are indexed by integral families ("auxiliary topologies") and sec- 
tors ("topologies") therein. For the reduction, the program implements a fully distributed 
variant of Laporta's algorithm using the Message Passing Interface (MPI). In this way, not 
only different sectors can be reduced in parallel, but also the integrals of a single sector 
can be reduced in a distributed computation. 

The program allows multiple integral families to be used within a calculation. Special 
emphasis has been placed on finding relations between sectors of the same or different 
integral families and employing them to eliminate integrals. Besides a straightforward com- 
binatorial matcher, the program implements graph and matroid theory based algorithms 
to compute such relations, taking into account possible crossings of external momenta. 
Similar to the program DIANA [15], Reduze may be used to shift loop momenta of Feynman 
diagrams generated by a program like QGRAF [16] or FeynArts [17] to match sectors of 
integral families. 

Other features include the generation of differential equations for Feynman integrals 
and the computation of (bare) amplitude interferences up to master integrals, starting from 
Feynman diagrams generated by QGRAF. For storing intermediate results of a reduction, 
optionally, the transactional open source database Berkeley DB [18] can be used. 

For the normalization of algebraic coefficients in the identities, one can choose between 
GiNaC [19] and Fermat [20]. Reduction identities and other results can be exported to 
FORM [21], Mathematica [22], and Maple [23] format. Configuration and job files use the 
YAML format [24] and are parsed with the yaml-cpp parser [25]. 

Reduze 2 was used to calculate the two-loop leading color corrections to heavy-quark 
pair production in the gluon fusion channel [26]. Last but not least, Reduze is published 
as open source under the GNU General Public License (GPL) v3 and has no mandatory 
dependencies on proprietary software. 

2 Basic concepts and notations 

2.1 Integral families, sectors, and integrals 

A propagator $P$ is defined as the expression $1/(q^2 - a)$ where $q$ is a four-momentum and 
$a$ is a constant. The momentum $q$ of a propagator (defined up to a minus sign) is a linear 
combination of loop momenta $k_i$ and external momenta $p_i$, and $q^2$ is a scalar product 
in Minkowski space with the metric convention $g = \mathrm{diag}(1, -1, -1, -1)$. In Reduze, also 
generalized propagators $1/(q l - m^2)$ with the scalar product of two different momenta $q$ 
and $l$ are available to support more general irreducible numerators. 

An $l$-loop integral family (or "auxiliary topology") $F$ is an ordered set $\{P_1, \ldots, P_n\}$ of 
propagators $P_i$, $i = 1, \ldots, n$, which is minimal and complete in the sense that any scalar 
product of a loop momentum $k_i$ with a loop momentum $k_j$ or an external momentum $p_j$ 
can be uniquely expressed as a linear combination of inverse propagators and kinematic 
invariants. Denoting the number of independent external momenta by $m$, an integral 
family must contain exactly $l(l+1)/2 + lm$ propagators, where the first term counts the 
scalar products between loop momenta only and the second term the products involving 
both loop and external momenta. A new feature of Reduze is its ability to handle several 
integral families simultaneously. 
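
As a quick illustration of the counting rule above, the following minimal sketch evaluates $l(l+1)/2 + lm$ for a given number of loops and independent external momenta. The function name `family_size` is hypothetical and is not part of Reduze.

```python
def family_size(loops: int, ext: int) -> int:
    """Number of propagators l(l+1)/2 + l*m of a complete integral family.

    loops -- number of loop momenta l
    ext   -- number of independent external momenta m
    """
    # First term: scalar products among loop momenta; second: loop times external.
    return loops * (loops + 1) // 2 + loops * ext


# Two-loop 2 -> 2 scattering: l = 2 and m = 3 independent external momenta.
print(family_size(2, 3))
```

For a two-loop four-point process this gives nine propagators, i.e. a family for a seven-propagator diagram needs two additional propagators that act as irreducible numerators.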

A selection of $t$ propagators of an integral family defines a sector of this family. Assuming 
a sector has the propagators $P_{j_1}, \ldots, P_{j_t}$ with $\{j_1, \ldots, j_t\} \subseteq \{1, \ldots, n\}$, its 
identification number is defined as 

$$\mathrm{ID} = \sum_{k=1}^{t} 2^{j_k - 1} . \tag{2.1}$$

There are in general $\binom{n}{t}$ different $t$-propagator sectors, and $\sum_{t=0}^{n} \binom{n}{t} = 2^n$ sectors are 
contained in an integral family. Their identification numbers fulfill $0 \le \mathrm{ID} \le 2^n - 1$. 

A sector whose propagators form a subset of the propagators of another sector of the 
same integral family is a subsector of the other sector. 
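
The identification number is a bit mask over the propagators of the family, so the subsector relation becomes a bitwise test. The sketch below illustrates this under that reading; the helper names are hypothetical and do not reflect Reduze's internal API.

```python
def sector_id(indices):
    """ID = sum over k of 2**(j_k - 1) for 1-based propagator indices j_k."""
    return sum(1 << (j - 1) for j in set(indices))


def sector_propagators(sid):
    """Inverse map: sorted 1-based indices of the propagators in sector `sid`."""
    return [j + 1 for j in range(sid.bit_length()) if (sid >> j) & 1]


def is_subsector(sub, sup):
    """True if every propagator of sector `sub` also belongs to sector `sup`."""
    return (sub & sup) == sub


sid = sector_id([1, 2, 4])           # propagators P_1, P_2, P_4
print(sid, sector_propagators(sid))  # 11 [1, 2, 4]
print(is_subsector(sector_id([1, 2]), sid), is_subsector(sector_id([3]), sid))
```

With this encoding every sector is a subsector of itself; a strict subsector additionally satisfies `sub != sup`.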

The purpose of an integral family is to index scalar loop integrals. To every $t$-propagator 
sector with propagators $P_{j_1}, \ldots, P_{j_t}$ belongs an infinite set of $d$-dimensionally 
regularized $l$-loop integrals [1] which all share the same propagators. These integrals have 
the generic form 

$$I = \int d^d k_1 \cdots \int d^d k_l \; P_{j_1}^{r_1} \cdots P_{j_t}^{r_t} \, P_{j_{t+1}}^{-s_1} \cdots P_{j_n}^{-s_{n-t}} \tag{2.2}$$

with integer exponents $r_i \ge 1$ and $s_i \ge 0$. In Reduze such an integral is represented by 

$$I(F, t, \mathrm{ID}, r, s, \{v_1, \ldots, v_n\}) \tag{2.3}$$

where $F$ denotes the integral family, $r = \sum_{i=1}^{t} r_i$, $s = \sum_{i=1}^{n-t} s_i$, and $v_i$ is the 
exponent of propagator $P_i$. Positive $v_i$ denote powers of regular propagators (non-trivial 
denominator), negative $v_i$ denote powers of inverse propagators (non-trivial numerator), 
and zero means absence of a propagator. The numbers $t$, $r$, $s$ as well as the identification 
number ID of the sector to which the integral belongs can be calculated from the vector 
$v$. 
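
The statement that $t$, ID, $r$ and $s$ follow from the exponent vector can be made concrete with a few lines of code. This is a minimal sketch; the function name `classify` is an assumption made for illustration.

```python
def classify(v):
    """Return (t, ID, r, s) for an exponent vector v = (v_1, ..., v_n)."""
    t = sum(1 for e in v if e > 0)                        # propagators in the denominator
    sid = sum(1 << i for i, e in enumerate(v) if e > 0)   # bit i stands for P_{i+1}
    r = sum(e for e in v if e > 0)                        # sum of denominator powers
    s = -sum(e for e in v if e < 0)                       # sum of numerator powers
    return t, sid, r, s


# P_1 P_2^2 P_4 in the denominator, P_5 P_7^2 in the numerator of a 7-propagator family.
print(classify((1, 2, 0, 1, -1, 0, -2)))  # (3, 11, 4, 3)
```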

Consider a $t$-propagator sector of an $n$-propagator integral family. The number of 
integrals that one can build for certain values of $r$ and $s$ is given by 
$\mathcal{N}(n, t, r, s) = \binom{r-1}{t-1} \binom{s+n-t-1}{n-t-1}$. The two binomial factors count all possible ways to 
arrange the exponents of the propagators in the denominator and numerator, respectively. 
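
The counting formula can be cross-checked by brute force: the first factor counts the ways to distribute $r$ over $t$ denominator exponents that are each at least one, the second the ways to distribute $s$ over $n - t$ non-negative numerator exponents. The sketch below assumes $1 \le t \le n$; the function names are hypothetical.

```python
from itertools import product
from math import comb


def n_integrals(n, t, r, s):
    """Closed form binom(r-1, t-1) * binom(s+n-t-1, n-t-1), assuming 1 <= t <= n."""
    if t == n:  # no numerator slots left, so only s = 0 is possible
        return comb(r - 1, t - 1) if s == 0 else 0
    return comb(r - 1, t - 1) * comb(s + n - t - 1, n - t - 1)


def brute_force(n, t, r, s):
    """Enumerate all exponent assignments explicitly (small arguments only)."""
    dens = [e for e in product(range(1, r + 1), repeat=t) if sum(e) == r]
    nums = [e for e in product(range(0, s + 1), repeat=n - t) if sum(e) == s]
    return len(dens) * len(nums)


print(n_integrals(7, 4, 5, 2), brute_force(7, 4, 5, 2))  # 24 24
```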

The integral with $r = t$ and $s = 0$ of some sector is called the corner integral of this sector. 



2.2 Integration by parts (IBP) identities 

In dimensional regularization [1] the integral over a total derivative is zero. Let $I'$ be the 
integrand of an integral of the form (2.2). Then, working out the differentiation in 

$$\int d^d k_1 \cdots \int d^d k_l \, \frac{\partial}{\partial k_i^\mu} \left[ q^\mu I' \right] = 0 \tag{2.4}$$

leads to the integration by parts (IBP) identities [2, 3]. The momentum $q$ is an arbitrary 
loop or external momentum. The index $\mu$ is summed over but the index $i$ is not. If there 
are $l$ loop momenta and $m$ independent external momenta one can therefore build $l(l+m)$ 
equations from one integral, the seed integral. 
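
A standard textbook illustration of such an identity is the one-loop massive tadpole, worked out below. The symbol $T(a)$ and the mass parameter $M^2$ are notation introduced for this example only; they do not appear in Reduze's input or output. With one loop momentum and no external momentum there is exactly $l(l+m) = 1$ equation per seed integral.

```latex
% One-loop tadpole with propagator power a (seed integral T(a)).
T(a) = \int d^d k \, \frac{1}{(k^2 - M^2)^a}

% IBP identity with q = k; use \partial k^\mu / \partial k^\mu = d
% and k^2 = (k^2 - M^2) + M^2 in the second term.
0 = \int d^d k \, \frac{\partial}{\partial k^\mu}
      \left[ \frac{k^\mu}{(k^2 - M^2)^a} \right]
  = d \, T(a) - 2a \int d^d k \, \frac{k^2}{(k^2 - M^2)^{a+1}}
  = (d - 2a) \, T(a) - 2 a M^2 \, T(a+1)

% Solved for the more complicated integral:
T(a+1) = \frac{d - 2a}{2 a M^2} \, T(a)
```

Applied repeatedly, this single relation expresses every $T(a)$ with $a > 1$ through the corner integral $T(1)$, which plays the role of the master integral of this sector.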

2.3 Lorentz invariance (LI) identities 

One can also use the Lorentz invariance of the integrals [4]. Taking an integral $I(p_1, \ldots, p_m)$ 
the following equation holds 

$$\sum_{i=1}^{m} \left( p_i^\nu \frac{\partial}{\partial p_{i\mu}} - p_i^\mu \frac{\partial}{\partial p_{i\nu}} \right) I(p_1, \ldots, p_m) = 0 . \tag{2.5}$$

The derivatives can be shifted directly to the integrand of the integral $I$. This equation 
can be contracted with all possible antisymmetric combinations of the external momenta, 
e.g. $p_{1\mu} p_{2\nu} - p_{1\nu} p_{2\mu}$, which leads to $m(m-1)/2$ equations where $m$ denotes the number 
of independent external momenta. As was shown in [10], the LIs do not give new linearly 
independent equations in addition to the IBPs. However, they can accelerate the 
convergence in a reduction, since in general an LI identity generated from one seed integral 
cannot be reproduced with the IBP identities generated from the same seed integral alone. 
Reduze offers the possibility to use the LIs. 

2.4 Zero sectors 

It is possible that a whole sector is zero which means that all integrals belonging to this 
sector are zero. A sector of an $l$-loop integral family is trivially zero if it does not allow 
for a selection of $l$ propagator momenta which are independent with respect to the $l$ loop 
momenta. The graph based methods in Reduze, see section 3.1, will automatically detect 
these cases. As a second method, a sector is set to zero if the reduced IBP identities 
generated from the seed integrals of this sector with $r = t$ and $s = 0, 1$ show that its corner 
integral is zero. 
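
The first criterion amounts to a rank condition: writing each propagator momentum as a row of its coefficients with respect to the loop momenta, the sector is trivially zero if this matrix has rank smaller than $l$. The following sketch checks this with exact rational arithmetic. It is not the graph based detection used by Reduze, and the function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small rational matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for col in range(ncols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def trivially_zero(loop_coeffs, loops):
    """loop_coeffs[i][j] is the coefficient of loop momentum k_{j+1} in propagator i."""
    return rank(loop_coeffs) < loops


# Two loops, propagator momenta k1 and k1 - p1: nothing depends on k2.
print(trivially_zero([[1, 0], [1, 0]], 2))           # True
# Propagator momenta k1, k2 and k1 - k2: two independent momenta exist.
print(trivially_zero([[1, 0], [0, 1], [1, -1]], 2))  # False
```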

2.5 Sector relations 

Given a scalar loop integral as well as one or several integral families, suppose it is possible 
to map the integral to a linear combination of indexed integrals of type (2.3). In general, 
such a map is not unique. Ambiguities may arise if sectors from different integral families 
have the same set of propagators or if a transformation of loop and external momenta in 
(2.2) leads to a different linear combination of type (2.3). For the corner integral of some 
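sector S written in the form (2.2), consider the transformation of integration variables 

$$k_i \to \sum_{j=1}^{l} M_{ij} k_j + \sum_{j=1}^{m} N_{ij} p_j , \qquad i = 1, \ldots, l, \tag{2.6}$$

with |det M| = 1. If the new integrand factors can be identified with propagators of a 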
sector S', the "shift" transformation (2.6) defines a sector relation between S and S' . In 
this case, any integral in the sector S can be expressed as a linear combination of integrals 
in the sector S' and subsectors of S'. If S and S' are different, such a relation can be used 
to eliminate one of the two sectors completely. The case where S and S' are identical is 
discussed in section 2.6. 

Reduze is able to automatically detect sector relations or handle relations supplied by 
the user. Sector relations will be used to eliminate redundant sectors or integrals, usually 
at the earliest possible stage. As a particularly useful special case, a shift (2.6) might map 
each propagator of an integral family to another propagator of the same integral family. 
This leads to a one-to-one mapping between integrals as well as sectors of the integral 
family. Such a relation can be entered for the full integral family via permutations of 
propagators and allows for particularly efficient removal of redundancies. 

As a generalization of the above described concepts, Reduze also allows for crossings 
of external momenta in addition to (2.6). If the involved crossing leaves the kinematic 
invariants unchanged, the corresponding relations can be directly exploited for relations 
between sectors as described before. In the general case, Reduze also handles relations 
between sectors of integral families defined with crossed kinematics. 

2.6 Sector symmetries 

Special shifts of the loop momenta as in (2.6) which transform a sector to itself are called 
sector symmetries. These shifts are also allowed to contain a permutation of the external 
momenta as long as it does not change the kinematic invariants. Sector symmetries may be 
used to express integrals in terms of other integrals in the same sector and its subsectors. 
These relations may provide information complementary to the IBP and LI identities and 
can be used in the reduction to find a minimal number of master integrals. Reduze is 
capable of automatically determining sector symmetries or handling user supplied rules, 
and offers to exploit them for reductions. 

3 Graph and matroid based algorithms for sectors 
3.1 Physical sectors 

A physical sector is a sector whose propagators correspond to edges in a graph such that 
momentum is conserved. The construction of a graph from a sector, i.e. from the momenta 
of a set of propagators with $l$ loop momenta, can be done by choosing $l$ propagators which 
have independent momenta with respect to the loop momenta and identifying them as 
edges in a graph with both ends glued together in a single root vertex. External edges 
are also attached to this root vertex with one of their ends. Subsequently, for each of 
the remaining propagators, a vertex of the graph (first the root vertex) is cleaved into two 
vertices, and a new edge is inserted between these vertices, such that the edge's momentum 
(determined by momentum conservation) exactly matches the propagator's momentum. 

Figure 1. Two non-isomorphic graphs which are related by a twist. The matroids of these graphs 
are isomorphic. 

With this procedure Reduze automatically constructs graphs for sectors where this is 
possible and thus identifies physical sectors. The possibility of having graph representations 
for sectors gives access to fast algorithms for identifying isomorphic graphs and finding 
sector relations and sector symmetries. 

3.2 Sector relations 

If graphs constructed for two different physical sectors are isomorphic, a shift of the form 
(2.6) between the two sets of loop momenta can be derived by identifying the edges of 
the two graphs together with their oriented momentum flow labeling. Reduze offers the 
possibility to find relations between physical sectors by this strategy, allowing also crossings 
of external legs for the graph isomorphism but restricting to cases with |det M| = 1. For 
the graph of each physical sector a standard form of its adjacencies, a canonical label, is 
computed with the algorithm [27]. We distinguish different masses by replacing massive 
edges with a chain of several edges, where the length of such a chain labels a mass uniquely. 

Graph isomorphism takes into account the ambiguities in labeling the nodes of a graph. 
While isomorphic graphs can be described by the same propagators (possibly with a cross- 
ing of external momenta), the inverse is not true. Consider for instance the two vacuum 
graphs in figure 1. The two graphs are non-isomorphic but can be described by the same 
propagators, i.e. by the same sector. Here, a more appropriate object to consider is not 
the graph of the sector, but the associated matroid. Matroids are based on the notion 
of a set of linearly independent sets and may be considered as generalizations of graphs. 
For a graph an associated graph matroid (or cycle matroid) can be defined via its edges. 
Definitions and fundamental properties are given in the review [28] and references therein. 
Essential for us is the following chain of statements. The relevant properties of a vacuum 
graph where all edges share a common mass are encoded in the first Symanzik polynomial 
(U polynomial). For brevity of the argument let us furthermore restrict to biconnected 
graphs. The generalization to arbitrary vacuum components with different masses is rather 
straightforward. An immediate combinatorial approach to isomorphisms of the Symanzik 
polynomials, which is not restricted to vacuum graphs, was presented in [29]. Here, we 
note that the first Symanzik polynomials of two graphs are equal up to a permutation 
of edge variables exactly if their matroids are isomorphic, see [28] and references therein. 
Two matroids of biconnected vacuum graphs are isomorphic exactly if the two graphs are 
isomorphic up to a series of twists, a statement known as Whitney's theorem [30]. A twist 
operation starts by breaking a graph into two graphs such that identification of separation 
pairs of nodes in both graphs restores the original graph. As the second step of the twist, the 
separation pairs are identified with flipped orientation. In figure 1 the graphs are related 
by a twist around the left-most and right-most nodes and thus have the same matroid. 
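
The chain of statements can be checked on a small example. The sketch below computes the monomials of the first Symanzik polynomial by brute-force enumeration of spanning trees (one monomial per tree, given by the edges not in the tree) for two three-loop vacuum graphs. The two graphs are hypothetical examples chosen for this sketch, not the graphs of figure 1: both are four-vertex cycles with some doubled edges, differing in the order of the edges along the cycle. The code only illustrates the statement and is unrelated to the canonical label algorithm used in Reduze.

```python
from itertools import combinations


def is_forest(n_vertices, pairs):
    """True if the given vertex pairs contain no cycle (simple union-find)."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def symanzik_u(n_vertices, edges):
    """Monomials of U: for each spanning tree, the set of edge labels NOT in the tree."""
    labels = frozenset(edges)
    return {
        labels - frozenset(tree)
        for tree in combinations(sorted(edges), n_vertices - 1)
        if is_forest(n_vertices, [edges[e] for e in tree])
    }


def degrees(n_vertices, edges):
    """Sorted vertex degrees, a simple invariant under graph isomorphism."""
    deg = [0] * n_vertices
    for a, b in edges.values():
        deg[a] += 1
        deg[b] += 1
    return sorted(deg)


# Cycle 0-1-2-3-0 with edge multiplicities (1, 1, 2, 2) and (1, 2, 1, 2), respectively.
graph_a = {"a": (0, 1), "b": (1, 2), "c": (2, 3), "d": (2, 3), "e": (3, 0), "f": (3, 0)}
graph_b = {"a": (0, 1), "c": (1, 2), "d": (1, 2), "b": (2, 3), "e": (3, 0), "f": (3, 0)}

print(degrees(4, graph_a), degrees(4, graph_b))          # different: not isomorphic
print(symanzik_u(4, graph_a) == symanzik_u(4, graph_b))  # same U polynomial
```

The degree sequences differ, so the graphs are not isomorphic, yet the U polynomials coincide with the chosen edge labels: reordering edges along a cycle is one of the twist operations that leave the matroid unchanged.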

These statements can be turned into an algorithm to detect shifts (2.6) between vac- 
uum sectors, which is sketched in the following and implemented in Reduze. We extend the 
graph isomorphism based shift detection by modifying the generated graphs with twists 
such that their canonical labels are minimized. A graph of a physical sector is decom- 
posed into biconnected components with the algorithm [31]. Possible separation pairs of 
biconnected components are identified via a decomposition into triconnected components. 
We implemented the algorithm [32, 33] for this purpose. In order to generate at least one 
representative for each graph isomorphism class, it is necessary to perform twists around 
separation pairs as specified by virtual edges as well as twists which correspond to per- 
mutations of edges within polygon components of the decomposition. While twisting we 
track the edges including their orientations to be able to identify propagator momenta and 
different masses. Graphs with external legs are handled by intermediately joining their 
external nodes into one node and restricting to those twists, which keep the external legs 
joined into one node with their original orientation. 

Alternatively, Reduze also offers a procedure to find all sector relations, which tries 
to identify sets of propagators in a straightforward approach based on linear algebra and 
combinatorics. While this procedure finds all shifts between arbitrary sectors, including 
sectors not corresponding to a graph, it is usually much less efficient than the graph based 
methods. 

3.3 Sector symmetries 

Sector symmetries are shifts of the form (2.6) which map a sector onto itself. Additional 
permutations of the external momenta are permitted as long as the kinematic invariants 
are unchanged. Different such sector symmetries for physical sectors can be found by 
the underlying symmetries of the graph as vertex permutations from the automorphism 
group and permutations of multi-edges. The automorphism group of a graph consists of 
all permutations of the vertices which leave the canonical label of the graph unchanged. 
These transformations are calculated in Reduze with the algorithm [27] and are used to 
derive the mapping of the edges between pairs of vertices and the associated shift of the 
loop momenta. In the case where there is more than one edge between two vertices also 
permutations of the edges with the same mass are considered. 

Alternatively, a complete set of sector symmetries can be calculated by Reduze using 
a combinatorial propagator matching approach. 
Figure 2. Left: Three blocks of equations for loop integrals $I_i$ with coefficients $c_{ij}$ depending 
on kinematic invariants and the space-time dimension. Each block (shaded rectangle) contains 
equations for the same leading integral (bold face $I_i$). Right: communication topology for MPI 
processes involved in a distributed reduction. 

3.4 Matching of diagrams to sectors 

As discussed above, the assignment of momenta to propagators of Feynman diagrams is 
ambiguous. If diagrams are generated with a program like QGRAF [16], typically the assigned 
loop momenta have to be shifted in order to index the involved loop integrals via integral 
families. Reduze can compute these shifts for diagram files generated with QGRAF. Reduze 
can also handle permutations of external momenta and find matchings of diagrams to 
crossed sectors. 

4 Distributed reduction algorithm 
4.1 Load balanced system solving 

Reduze computes reductions for Feynman integrals by generating identities for a range of 
seed integrals and reducing this linear system of equations. The seed integrals are usually 
chosen for ranges in the propagator exponent sums r and s. In a typical application, 
reductions for integrals from several sectors are needed. Moreover, a full reduction of a 
specific sector requires in general also the reduction of subsector integrals. Reduze proceeds 
bottom-up: the reduction of a sector is started only after all subsector results are available. 
Sectors which are no subsectors of each other can be reduced independently such that these 
tasks are easily parallelized. This kind of parallelization is available in the first version of 
Reduze via a shell script, which launches programs for different sectors. In version 2 of 
Reduze, two levels of parallelization are implemented via the Message Passing Interface 
(MPI) standard. The reduction of a sector becomes a job and several independent such 
reduction jobs can be executed in parallel. The job system is described in more detail in the 
next section. On top of this first level of parallelization, this version of Reduze implements 
a distributed reduction algorithm for the reduction of a single sector. Since in this case 
the parallelization is less obvious we give some details about our implementation in the 
following subsection. 

In Reduze, a total ordering is defined for indexed integrals of the type (2.3). The 
ordering defines integrals of a sector to be more complicated than integrals of its subsectors. 
In the following, terms like "leading" or "lower" integral refer to this ordering. To choose 
specific integrals as master integrals the user may adjust the ordering, possibly at a later 
stage. In order to reduce integrals of a given sector, IBP and LI identities are generated 
for a specified range of seed integrals. This results in a sparse homogeneous linear system 
of equations for the indexed integrals where the coefficients are rational functions of the 
kinematic invariants and the space-time dimension. The equations are sorted into blocks 
of equations with the same leading integral, see left panel of figure 2. 

The blocks are reduced bottom up, starting from the block with the lowest leading in- 
tegral. For each block, integrals are reduced, i.e. replaced by linear combinations of lower 
integrals, according to the results from lower blocks and subsectors ("back substitution") if 
possible. If a block contains several equations, one is kept and used to eliminate the leading 
integral from all other equations in the block ("forward elimination"), which are subse- 
quently solved for the new leading integral. The coefficients of the integrals are normalized 
such that zeros are detected and numerator and denominator are coprime. This requires 
multivariate polynomial greatest-common-divisor (GCD) computations which typically 
present the most time consuming part of the full calculation. The result of this reduction 
of a block is one equation for the block and possibly further equations with lower leading 
integrals to be inserted into lower blocks. The next block to be selected is the lowest block 
which contains more than one equation or involves integrals which can be reduced. 
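
The bottom-up treatment of blocks can be sketched in a few lines for a toy system. The sketch makes strong simplifying assumptions: integrals are labeled by integers whose size encodes the ordering, coefficients are rational numbers instead of rational functions (so no polynomial GCD is needed), and the next block is simply the one with the lowest leading integral. It is meant to illustrate back substitution and forward elimination, not to reproduce Reduze's implementation; all names are hypothetical.

```python
from fractions import Fraction


def substitute(eq, rules):
    """Back substitution: replace every integral that already has a reduction rule."""
    eq = {i: c for i, c in eq.items() if c != 0}
    while True:
        hit = next((i for i in eq if i in rules), None)
        if hit is None:
            return eq
        c = eq.pop(hit)
        for j, cj in rules[hit].items():  # rules[hit] encodes I_hit = sum_j cj * I_j
            eq[j] = eq.get(j, 0) + c * cj
            if eq[j] == 0:                # detect zeros right away
                del eq[j]


def reduce_system(equations):
    """Each equation is a dict {integral: coefficient} meaning sum_i c_i I_i = 0."""
    pending = [dict(e) for e in equations]
    rules = {}
    while True:
        pending = [e for e in (substitute(e, rules) for e in pending) if e]
        if not pending:
            break
        lowest = min(max(e) for e in pending)            # lowest leading integral
        block = [e for e in pending if max(e) == lowest]
        # Keep one equation of the block as the rule; the others are handed back
        # and lose their leading integral in the next substitution pass
        # (forward elimination).
        pending = [e for e in pending if max(e) != lowest] + block[1:]
        pivot = block[0]
        rules[lowest] = {
            j: Fraction(-cj) / pivot[lowest] for j, cj in pivot.items() if j != lowest
        }
    # Final pass: express every rule through unreduced (master) integrals only.
    return {i: substitute(rule, rules) for i, rule in rules.items()}


# 2 I3 - 4 I2 + I1 = 0  and  I3 - I2 - I1 = 0
result = reduce_system([{3: 2, 2: -4, 1: 1}, {3: 1, 2: -1, 1: -1}])
print(result)  # I3 = 5/2 I1 and I2 = 3/2 I1
```

In this toy system $I_1$ remains as the only master integral. In a real reduction the arithmetic in `substitute` is where the coefficient normalization, and hence the GCD computation, takes place.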

Reduze offers both a purely serial reduction for one sector and a distributed 
execution. In the serial version, the above steps are performed in deterministic order on 
one core. The distributed version employs a star topology of MPI processes with one 
manager and one or more workers, see right panel of figure 2. The workers perform the 
actual reduction steps for a block, while the manager keeps track of the complete system 
of equations and balances the work between the workers. More specifically, an idle worker 
contacts the manager to request work. The manager looks up the next block to process and 
sends its equations together with equations needed for back substitutions to the worker. 
The worker reduces the block and sends the result to the manager. 

Our motivation for this distributed algorithm is as follows. Experiments show that in 
typical applications the time needed for the reduction of one block can easily differ by more 
than 6 orders of magnitude. Moreover, the exact order of the individual reduction steps 
significantly determines the execution time for the full reduction and a bottom-up order 
typically shows the best performance. Both issues are directly addressed by the dynamical 
load balancing mechanism presented above, at least for a not-too-large number of worker 
processes. How well this works in practice is quantified in the following subsection. 

Reduze allows one to choose between GiNaC [19] and Fermat [20] for the GCD computation 
needed to normalize coefficients. During reduction, the equations are stored either in RAM 
or optionally in a transactional database as implemented by the open source Berkeley 
DB [18]. With transactions turned on, an aborted reduction of a single sector may be 
resumed at a later time; recovering completed reductions for sectors is available in any case. 
Reduze supports different ways to split a calculation into several runs; this is described in 
the tutorial provided with the Reduze distribution. Reductions for crossed integrals are 
automatically obtained from the reduction results of their uncrossed counterparts in order to 
save computation time and disk space. 

Figure 3. Execution time for the reduction of single sectors as a function of the number of worker 
processes employed for the reduction. 

4.2 Performance results for single sectors 

Performance results for the reduction of sectors for two-loop contributions to heavy-quark 
pair production are shown in figure 3. These sectors have $t = 4$ or $5$, respectively, and 
reductions were computed for integrals with $r = t, \ldots, 7$ and $s = 0, \ldots, 3$ or $4$, respectively. 
We used a computer with 48 CPU cores operating at 2.1 GHz and started Reduze with 
$n_{\text{workers}} + 2$ MPI processes (the additional job center process and manager process should be 
overbooked when only a small number of cores is available). 

In general, we observe that the scaling with the number of processes is problem specific 
and depends on the configuration of Reduze, such as the chosen computer algebra system. 
The upper two curves in the figure show an example with a good scaling for up to 22 
workers. Indeed, we find examples where the scaling is good up to 48 workers. We think 
this good scaling behavior is noteworthy, given the fact that it describes the distributed 
computation of a not too loosely coupled system. As expected, we observe that beyond 
some number of worker processes the run time decreases less and less with additional 
workers and finally increases for an even larger number of workers. Contributions to this 
behavior are expected from serial parts of the code, communication overhead, but potentially 
also from a "less ideal" order of evaluation when solving the system of equations with a 
larger number of workers. The lowest curve was obtained for an example of a reduction 
of a comparably simple system of equations, where the onset of such a behavior is clearly 
visible. It is also not difficult to find examples with worse scaling, where a minimal runtime 
is obtained for only a few worker processes. 

Using Fermat instead of GiNaC for the GCD computations can easily result in a speed-up 
by an order of magnitude, see the two blue curves for sector B in figure 3. For the 
displayed Fermat benchmarks, the system to be reduced was kept in RAM, while for the 
GiNaC benchmarks a database was used. Our tests indicate that for examples of this type, 
the performance penalty for using a database is considerably less than the differences due 
to the two algebra systems considered here. Nevertheless, this option is also relevant for 
optimal performance, especially for a larger number of workers. Typically, if the coefficients 
in the equations become more involved, the reduction benefits considerably from a larger 
number of workers compared to cases with compact coefficients. The performance in a real 
application example, involving the reduction of several sectors, is discussed at the end of 
the next section. 

Figure 4. Dynamical load balancing in Reduze 

5 Job system 

5.1 Load balanced execution of jobs 

In Reduze, a job represents a sequence of computations which can be performed once its 
dependencies, specified via the presence of input files, are fulfilled. Most jobs in Reduze 
are serial jobs: they are executed on a single core but possibly in parallel to other jobs. 
Parallel execution of different jobs represents the top layer parallelization mechanism of 
Reduze which is automatically available for any job type added to Reduze. Reduction 
of identities is an example of a distributed job. Such a job can be executed by several 
processes in parallel: one process, the manager, is responsible for the full execution of 
the job, other processes, the workers, help for some time with the execution. In order 
to employ this second layer parallelization, a dedicated distributed algorithm needs to be 
implemented for the job. For each run of Reduze, the user specifies a sequence of such 
jobs, which are inserted into a job queue. A job may generate additional auxiliary jobs 
automatically. The job queue is responsible for resolving the dependencies between the 
jobs and determining the next job to be executed. 
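
The dependency resolution described above can be pictured with a small sketch in which a job becomes runnable once all of its input files are present. The `Job` class, the job names and the file names are hypothetical and only serve to illustrate the idea; the actual job queue of Reduze additionally handles auxiliary jobs and parallel scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    inputs: set = field(default_factory=set)   # files that must exist before the job runs
    outputs: set = field(default_factory=set)  # files the job produces


def run_order(jobs, available):
    """Return job names in an order that respects the file dependencies."""
    available = set(available)
    waiting = list(jobs)
    order = []
    while waiting:
        ready = [j for j in waiting if j.inputs <= available]
        if not ready:
            names = ", ".join(j.name for j in waiting)
            raise RuntimeError("unresolvable dependencies: " + names)
        job = ready[0]
        waiting.remove(job)
        order.append(job.name)
        available |= job.outputs               # results become inputs for later jobs
    return order


# Sector 7 (propagators 1, 2, 3) needs the results of its subsectors 3 and 5.
jobs = [
    Job("reduce_sector_7", {"ids_7", "red_3", "red_5"}, {"red_7"}),
    Job("reduce_sector_3", {"ids_3"}, {"red_3"}),
    Job("reduce_sector_5", {"ids_5"}, {"red_5"}),
]
print(run_order(jobs, {"ids_3", "ids_5", "ids_7"}))
```

In this serial sketch the two subsector jobs run one after the other; in a parallel run they are independent and could be assigned to different processes at the same time.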
Figure 5. Runtime decrease for the reduction of multiple sectors using a large number of processor 
cores. The reductions cover the sectors of the depicted graphs and all of their subsectors. In the 
graphs, thick lines represent massive propagators. 

If Reduze is started with several MPI processes, one process will act as a job center and 
dynamically balance work between the other processes, the clients of the job center, see 
figure 4. The job center schedules jobs using the job queue and assigns them to clients. An 
idle client contacts the job center and requests work. The job center responds by assigning 
a job to the client, either to be executed as a manager or as a worker. The client changes 
its role accordingly. As a worker, it contacts the responsible manager of the job and helps 
with the job execution. As a manager, it can either execute the job by itself or register as 
an employer at the job center to request workers. 

In order to optimize the efficiency of the parallelization, the job center periodically col- 
lects performance measurements from the employers and estimates an optimal distribution 
of workers based on it. According to this estimate, workers will effectively be reassigned 
to other employers by requesting release of workers from "inefficient" employers and assigning 
idle clients to "efficient" employers. The basic idea is to avoid low efficiencies 
due to overloaded managers by assigning workers preferably to managers which idle a high 
percentage of their CPU time. 

5.2 Performance results for multiple sectors 

Performance results for the reduction of a selection of sectors are shown in figure 5. These 
sectors are encountered in the calculation of two-loop corrections to heavy-quark pair production. 
Reductions are calculated for integrals of the depicted sectors and all of their 
subsectors, where $r = t, \ldots, 7$ and $s = 0, \ldots, 4$. The benchmarks for the solid curve were 
performed on a computer with 48 CPU cores operating at 2.1 GHz using Fermat and keeping 
all equations in RAM. The measurements for the dashed curve were obtained using 
the "Schrödinger" cluster of the University of Zurich with 2.8 GHz cores; these runs were 
configured to use GiNaC and a database. 

The figure shows for this realistic application example that the calculation benefits 
considerably from up to 96 cores, if available. Let us stress again that the scaling behavior 
is problem specific and may be worse for other types of applications. In the present example, 
the run-time decrease due to additional cores is quite close to an optimal $1/n_{\text{cores}}$ behavior 
for smaller numbers of cores. 

6 Other features 

6.1 Differential equations for master integrals 

One method to solve master integrals consists of deriving differential equations by taking 
derivatives in the kinematic invariants, replacing new integrals with the reduction results 
and solving the differential equations by integration. In particular for sectors with several 
independent integrals, a change of basis for the master integrals may be required. Reduze 
offers the possibility to derive differential equations for Feynman integrals and reduce these 
equations for some user choice of master integrals. 

6.2 Interference terms 

Starting from diagrams generated by QGRAF, Reduze can compute scalar interferences of 
(bare) QED or QCD amplitudes in dimensional regularization. This includes insertion of 
user-defined Feynman rules, contraction of Lorentz vector indices, performing Dirac traces, 
and evaluating color structures. In case of an interference of a tree-level diagram with a 
diagram that could be matched to a sector of an integral family (section 3.4) also the 
occurring integrals, which belong to the matched sector and its subsectors, are indexed by 
the corresponding integral family. In a further step, these integrals can be reduced to master 
integrals and substituted correspondingly in the interference terms. Each computation of 
an interference of two diagrams is treated as a job, and when MPI is used, these jobs can 
be performed in parallel. Specific examples are distributed with the Reduze package. 

7 Usage 

The package Reduze can be downloaded from the web page http://projects.hepforge.org/reduze. 
The distribution contains a tutorial with a detailed description of installation 
and usage as well as several example files. 